ABSTRACT 


Since their publication in 2016 we have seen a rapid adoption of the FAIR principles in many scientific 
disciplines where the inherent value of research data and, therefore, the importance of good data management 
and data stewardship, is recognized. This has led to many communities asking “What is FAIR?” and “How 
FAIR are we currently?”, questions which were addressed respectively by a publication revisiting the 
principles and the emergence of FAIR metrics. However, early adopters of the FAIR principles have already 
run into the next question: “How can we become (more) FAIR?” This question is more difficult to answer, as 
the principles do not prescribe any specific standard or implementation. Moreover, there does not yet exist 
a mature ecosystem of tools, platforms and standards to support human and machine agents to manage, 
produce, publish and consume FAIR data in a user-friendly and efficient (i.e., “easy”) way. In this paper we 
will show, however, that there are already many emerging examples of FAIR tools under development. This 
paper puts forward the position that we are likely already in a creolization phase where FAIR tools and 
technologies are merging and combining, before converging in a subsequent phase to solutions that make 
FAIR feasible in daily practice. 


1. INTRODUCTION 


At a glance, the FAIR principles simply stipulate a number of “best practices” on how to deal with data 
and their associated metadata. However, a more careful reading of both the principles and their associated 
publications [1, 2] reveals some of the potential complexities when trying to implement FAIR [3]. These 
issues break down into at least three specific, orthogonal aspects. Firstly, a number of principles provide 
guidelines about the relationship between data, the representation of the data and the associated metadata 
that describes the data more fully (e.g., F1, F2, F3, I1, I2, I3, R1, R1.1, R1.2, R1.3). Even though it is clear 
what is required by these principles, it is not specified how it should be done, i.e., FAIR is not, in itself, a 
standard [2]. Secondly, there are a number of principles that require extensive infrastructural support like 
search engines, communication protocols and identifier resolution services (e.g., F4, A1, A2). Thirdly, there 
are a number of principles that refer to a community consensus or standard either explicitly (R1.3 and, by 
recursion, I2) or implicitly, concerning for example the definition of “rich”, “shared” and “relevant” (F2, I1, 
R1). Moreover, the principles are open to interpretation with regard to the type of digital resource and its 
granularity. For example, when a principle talks about “data” does it refer to a data set as a whole, or could 
it refer to each individual data record (or item) contained in the data set? Finally, the principles need to be 
taken as guidelines that primarily aim to enable machines to (autonomously) interact with data [1], thus 
adding another possible layer of interpretation and implementation complexity. 


In this paper we will consider which tools and technologies are currently available and which functionality, 
to the best of our knowledge, is still lacking to support stakeholders in each step from FAIR data management 
planning to FAIR data creation, publication, evaluation and (re)use. As authors we have also developed 
such tools in recent years and we include them here in order to illustrate possible solutions and highlight 
open issues. A full and comprehensive review of relevant tools and technologies is out of scope for this 
paper, but references in this paper are available as a community-editable Wiki page [4] and we welcome 
contributions there in order to increase awareness of existing efforts and to facilitate technological 
creolization [5] and convergence. 


2. FAIR DATA MANAGEMENT PLANNING 


With the increase of data-driven research and the rising importance of digital research objects and other 
digital artifacts [6], e.g., for the purpose of reuse and reproducibility [7], there is more need than ever for 
researchers to follow proper data management procedures. Moreover, researchers are increasingly required 
to provide a Data Management Plan (DMP) that meets the requirements as set out by different funding 
organizations [8] and serves as an adaptable, guiding document of the data management process during 
the project. A large number of DMP tools have emerged to assist researchers to create and maintain DMPs. 
The main challenge for a DMP tool is to efficiently transfer knowledge regarding the many organizational, 
procedural and technical aspects of data management and data stewardship to an audience of researchers 
from different backgrounds and domains in order to produce an application- and domain-relevant DMP 
and to maximize opportunities for good data handling and reuse during and after the project. Many of these 
tools use the FAIR guiding principles for data management, but do so in a variety of ways. Here we take a 
look at two examples: DMPOnline [9] and the Data Stewardship Wizard (DSW) [10]. For a more complete 
discussion, please see [8]. DMPOnline has recently seen rapid adoption by researchers and organizations 
as the go-to tool to produce funder-compliant DMPs. It provides an online, collaborative environment with 
(mostly) open text forms divided into sections following a configurable funder’s DMP template. For each 
section, DMPOnline embeds explanatory text from a configurable set of sources, which may be DMP 
guidelines from funding organizations or academic institutions and may (or may not) contain FAIR-specific 
guidance. In contrast, the DSW tool guides the user through a comprehensive, “FAIR-aware” data 
management knowledge model by asking a number of multiple-choice questions with embedded book 
excerpts for additional explanation [11]. This organization allows DSW to very efficiently point the user to 
the relevant data stewardship issues, tools and other resources by omitting the parts from the larger 
knowledge model that would only apply to other cases. DSW also facilitates automatic evaluation of the 
questions, for example in order to produce FAIRness metrics or other evaluation scores. In the future we are 
likely to see a continuation of efforts toward machine actionable DMPs and tooling¹, thus enabling 
DMP interoperability, exchange and (semi-)automatic evaluation of (parts of) the reported data management 
process. Interestingly, the FAIR metrics (see Section 5) share similar objectives, which suggests that DMP 
and FAIR metrics tools may be destined for co-evolution in the future. 


3. FAIR DATA PRODUCTION 


One of the main challenges following from the FAIR guidelines is that they propose a number of attributes 
to be associated with the data: unique identifiers [12], (qualified references to) rich metadata, use of 
vocabularies, provenance, etc. The value of these attributes to any downstream data consumer (be it a 
human or machine agent) is quite clear, but providing them can also pose a burden on the data producer. We 
foresee the emergence of a category of tools that help data producers make sure the data contain the required 
attributes. These “FAIRifier” tools may come in many different flavors: supporting either generic or domain- 
specific use cases, FAIRifying at the source or post-hoc, targeting different end-users (e.g., data scientists or 
data stewards), using different technologies (e.g., semantic Web technology) and supporting (semi)automated 
or manual workflows. 


We have developed a general-purpose FAIRifier on the basis of the OpenRefine data cleaning and 
wrangling tool [13, 14, 15] and the RDF plugin². This FAIRifier enables a post-hoc FAIRification workflow: 
load an existing data set (from a wide range of formats), (optionally) perform data wrangling tasks, add FAIR 
(metadata) attributes to the data, generate a linked data version of the data and, finally, push the result to 
an online FAIR data infrastructure to make it accessible and discoverable. Literal values in a data set can 
be replaced by identifiers (URLs) either manually, by semi-automatic mapping to pre-loaded ontologies 
(using the OpenRefine reconciliation function) or by embedded, customizable script expressions. The 
interoperability of the data set can be improved by connecting these identifiers into a meaningful semantic 
graph-structure (model) of ontological classes and properties using the integrated RDF model editor. A 
provenance trail automatically keeps track of each modification and additionally enables “undo” operations 
and repetition of operations on similar data sets. A FAIR data export function opens up a metadata editor 
to provide information about the data set itself: title, publisher (author), license, and a range of additional 
optional metadata. 


¹ DMP Common Standards WG. Available at: https://www.rd-alliance.org/groups/dmp-common-standards-wg. 
² OpenRefine RDF plugin. Available at: https://github.com/stkenny/grefine-rdf-extension. 
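

The core step of the post-hoc workflow described above, replacing literal values by identifiers and generating a linked data version, can be illustrated with a minimal sketch in plain Python. This is not the OpenRefine-based FAIRifier itself: the vocabulary mapping, the example.org URLs and the function name fairify are hypothetical and chosen only for illustration, and the string escaping is simplified compared with the full N-Triples grammar.

```python
# Hypothetical mapping from literal values to ontology identifiers (URLs).
# In practice such a mapping would come from reconciliation against a
# pre-loaded ontology; these URLs are illustrative only.
VOCAB = {
    "human": "http://example.org/vocab/Human",
    "mouse": "http://example.org/vocab/Mouse",
}

BASE = "http://example.org/dataset/record/"          # assumed record namespace
HAS_SPECIES = "http://example.org/vocab/hasSpecies"  # assumed property


def fairify(rows):
    """Turn rows of (record_id, species_literal) into N-Triples lines.

    Literals found in VOCAB become URL objects; unknown literals are kept
    as plain string literals so that no information is lost.
    """
    triples = []
    for record_id, species in rows:
        subject = f"<{BASE}{record_id}>"
        key = species.strip().lower()
        if key in VOCAB:
            obj = f"<{VOCAB[key]}>"
        else:
            # Simplified escaping of backslashes, quotes and newlines.
            escaped = (species.replace("\\", "\\\\")
                              .replace('"', '\\"')
                              .replace("\n", "\\n"))
            obj = f'"{escaped}"'
        triples.append(f"{subject} <{HAS_SPECIES}> {obj} .")
    return triples


if __name__ == "__main__":
    for line in fairify([("1", "Human"), ("2", "zebrafish")]):
        print(line)
```

Keeping unmapped values as literals mirrors the semi-automatic nature of the mapping: values that cannot be reconciled remain available for later manual curation.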


Future development plans include features to make the FAIRifier easier to use for non-technical users. 
This includes functionality to suggest transformations and (semi)automatic application of graph models 
based on libraries of ontologies and graph models created by other (expert) users. Many other tools have 
demonstrated FAIRification capabilities with different benefits and limitations. To name a few: Karma³ offers 
a user-friendly interface and automatic model selection capability that are not available in the 
OpenRefine-based FAIRifier, but lacks some of its other features. RightField [16] and Ontomaton [17] make 
FAIRification transparent to end-users by pre-configuring spreadsheet applications with a semantic data model. 
The different concepts and functionalities offered by these tools are all worth further evaluation and 
development in the context of creating a rich ecosystem of FAIRifier tools. Finally, note that the tools 
mentioned in this section and the next adopt ontologies [18] and linked data [19]. These technologies align 
very well with a number of FAIR principles “out of the box”, but other tools may choose a different core 
technology for their implementation. 


4. PUBLISHING FAIR DATA 


Data coming from a FAIRifier still cannot be considered fully FAIR and machine actionable unless they 
have been published to, or otherwise made available via, the Internet. Here we focus mainly on the 
principles collected under the “A” and related infrastructural aspects; for issues regarding Findability of FAIR 
data sets, please see Section 6. Arguably, the main challenge regarding Accessibility is to make every 
part of the access process machine actionable, so that machines are enabled to automatically negotiate 
access (based on conditions set by the data owner) and to retrieve data and metadata in order to 
(semi)automatically evaluate their fitness for purpose. Part of this problem relates to the representation of 
accessibility conditions and their organizational, regulatory or legal framework [20, 21, 22]. Another part 
requires specific support from the infrastructure, i.e., if conditions permit access, the infrastructure should 
allow data consumers to get to the data in a straightforward, predictable way. This means choosing between 
a large number of protocols and APIs and their respective standards and conventions. 


³ Karma: A data integration tool. Available at: http://usc-isi-i2.github.io/karma/. 


We have developed the concept of a FAIR Data Point (FDP) [23] with a dual, ongoing goal: 1) to 
demonstrate comprehensive compliance with the FAIR principles and metrics and 2) to serve as a light-weight 
infrastructural component and standard that may be used by existing repositories and infrastructures. 
Primary design objectives to support these goals were to require only minimal (but extensible) semantic 
descriptions and to adopt a light-weight interface. An FDP serves relevant, FAIR metadata as RDF over a 
simple RESTful API [24] on five different hierarchical layers starting with metadata about the FDP itself, 
followed by Catalogs, Data sets, Distributions and, finally, record-level metadata. Its metadata is mainly 
based on the widely used DCAT⁴ and Dublin Core⁵ standards, with minor extensions to comply with FAIR 
principles (detailed in the FDP specification document⁶). Given an FDP URL, a DCAT-aware REST client can 
automatically traverse the FDP hierarchy down to the level of actual data records. Traversal may be directed 
by the client's evaluation of the metadata (e.g., for relevance) or may be halted by the FDP if access 
restrictions for that level apply. We intend to use the FDP in combination with more refined, currently 
emerging semantic models to describe access conditions (e.g., based on consent and GDPR regulations) 
and integration with an Authorization and Authentication Infrastructure for applications in the health 
domain [25]. There are a number of other standards (most notably Linked Data API⁷, Hydra⁸ and Linked 
Data Platform⁹) that provide more sophisticated descriptions to the client about API state transitions and 
additional API functionality such as querying. We consider these efforts complementary to the FDP and 
combinations are likely possible. We are currently evaluating in which scenarios such combinations would 
offer additional benefit before extending the FDP core functionality accordingly. 
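

The traversal behavior described above can be sketched in a few lines of Python. The sketch below does not use the actual FDP API: it replaces HTTP requests and RDF parsing by a toy in-memory dictionary, and the URLs, the dictionary keys, the "restricted" flag and the function name traverse are all assumptions made for illustration. A real DCAT-aware client would instead dereference each URL and follow properties such as dcat:dataset and dcat:distribution in the returned RDF.

```python
# Toy in-memory stand-in for metadata served by an FDP. A real client would
# dereference each URL over HTTP and parse the returned RDF; the URLs, keys
# and the "restricted" flag are assumptions for illustration only.
METADATA = {
    "http://example.org/fdp": {
        "layer": "fdp", "title": "Example FDP",
        "children": ["http://example.org/fdp/catalog/1"],
    },
    "http://example.org/fdp/catalog/1": {
        "layer": "catalog", "title": "Biobank catalog",
        "children": ["http://example.org/fdp/dataset/1",
                     "http://example.org/fdp/dataset/2"],
    },
    "http://example.org/fdp/dataset/1": {
        "layer": "dataset", "title": "Patient registry", "restricted": True,
        "children": ["http://example.org/fdp/distribution/1"],
    },
    "http://example.org/fdp/dataset/2": {
        "layer": "dataset", "title": "Open sample data",
        "children": ["http://example.org/fdp/distribution/2"],
    },
    "http://example.org/fdp/distribution/2": {
        "layer": "distribution", "title": "CSV download", "children": [],
    },
}


def traverse(url, is_relevant, visited=None):
    """Depth-first walk down the metadata layers starting at `url`.

    Returns the URLs whose metadata could be read. The walk does not descend
    below a restricted resource (the server halts traversal there) or below
    a resource the client judges irrelevant.
    """
    visited = [] if visited is None else visited
    record = METADATA.get(url)
    if record is None:  # no metadata is served for this URL
        return visited
    visited.append(url)
    if record.get("restricted") or not is_relevant(record):
        return visited
    for child in record["children"]:
        traverse(child, is_relevant, visited)
    return visited


if __name__ == "__main__":
    for found in traverse("http://example.org/fdp", lambda record: True):
        print(found)
```

In this toy setup the metadata of the restricted data set remains readable, but its distribution is never reached, which corresponds to the FDP halting traversal at the level where access restrictions apply.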


5. EVALUATING THE FAIRNESS OF A RESOURCE 


An emerging consideration for the different stakeholders involved in FAIR activities is the assessment of 
the FAIRness level of resources. It is often useful to assess to what extent a resource (data or metadata) 
follows the FAIR principles. This assessment can help evaluate whether initial goals for the resource have been 
achieved and can also help identify desirable points for improvement. A number of different initiatives are 
currently working on defining frameworks, methods and criteria for evaluating FAIRness. Initiatives include 
the FAIR Metrics Group¹⁰, the RDA FAIR Data Maturity Model Working Group¹¹, the NIH Data Commons 
Pilot Phase Consortium¹² and others; most of these are ongoing efforts. Nevertheless, a number of online 
evaluation tools and forms have become available [26, 27, 28, 29, 30], which illustrates the perceived 
importance of helping users to measure the FAIRness of their own or other people’s resources in all phases 
of the data life cycle. 


⁴ https://www.w3.org/TR/vocab-dcat/. 
⁵ http://dublincore.org/. 
⁶ https://github.com/FAIRDataTeam/FAIRDataPoint-Spec/blob/development/spec.md. 
⁷ https://github.com/UKGovLD/linked-data-api/blob/wiki/Specification.md. 
⁸ https://www.hydra-cg.com/spec/latest/core/. 
⁹ https://www.w3.org/TR/ldp/. 
¹⁰ http://www.fairmetrics.org/. 
¹¹ https://rd-alliance.org/groups/fair-data-maturity-model-wg. 
¹² https://commonfund.nih.gov/commons/awardees. 


For instance, the aforementioned Data Stewardship Wizard has incorporated metrics from the FAIR Metrics 
Group in its knowledge model so that the user can have an indication of the FAIRness level that is expected 
from the yet to be created data. After data creation, another evaluation can be performed to measure the 
achieved FAIRness level, and if necessary, a review of the plan can be made to mitigate any problems [31]. 
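

As an illustration of how such an evaluation could be automated, the following Python sketch scores a metadata record against a small list of checks. The checks, the metadata keys and the function name evaluate are hypothetical, simplified stand-ins; they are not the metrics defined by the FAIR Metrics Group or by any of the other initiatives mentioned above.

```python
# Each check is a (label, predicate) pair over a metadata record (a dict).
# The labels refer to FAIR principles; the checks themselves are simplified,
# hypothetical stand-ins and not the metrics defined by any working group.
CHECKS = [
    ("F1: identifier is a URL",
     lambda m: str(m.get("identifier", "")).startswith(("http://", "https://"))),
    ("R1.1: has a usage license",
     lambda m: bool(m.get("license"))),
    ("R1.2: has provenance information",
     lambda m: bool(m.get("provenance"))),
]


def evaluate(metadata):
    """Return (score, failed_labels); score is the fraction of checks passed."""
    failed = [label for label, check in CHECKS if not check(metadata)]
    score = (len(CHECKS) - len(failed)) / len(CHECKS)
    return score, failed


if __name__ == "__main__":
    record = {"identifier": "https://example.org/ds/1", "license": "CC-BY-4.0"}
    score, failed = evaluate(record)
    print(f"score: {score:.2f}")
    for label in failed:
        print("failed:", label)
```

Running the same function on the planned metadata and again on the metadata of the created data would give the two measurements described above, with the list of failed checks indicating where the plan needs to be reviewed.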


6. FINDING AND (RE)USING FAIR DATA 


Arguably, efficient use and reuse of data is a major objective of the FAIR guiding principles. Consider 
an ideal digital world where all data are FAIR: machine agents should then be able to (autonomously) 
execute a process or workflow to find (principles F) and access (A) any available, relevant data sources and 
automatically integrate, query and reason over the interoperable (I) data toward a useful result to a problem 
formulated by either human users or indeed other machine agents. It may seem therefore that reusability 
(R) is trivially solved if resources fully comply with the F, A and I principles and infrastructure exists to support 
it. However, we would argue that without due consideration of the principles under R, the data would still 
not be very (re)usable and that the effects and requirements of the R principles permeate through to all the 
other principles, all steps in the data life cycle, as well as any FAIR supporting infrastructures and tools. 
Consider, for example, the step of finding relevant data, a problem for which many technical solutions 
exist, including some exhibiting certain FAIR characteristics. These include, for example, the FAIR data search 
engine prototype, which harvests FDP metadata, indexes it and offers a search UI and API for human and 
machine searches, respectively [32]. An alternative approach uses structured embedded metadata which 
may be crawled and indexed by existing online search services: for example, a Web page related to a data 
set could contain structured “Dataset” metadata¹³, which would allow the data set to show up in the Google 
Dataset Search¹⁴ service. Hybrid approaches are also possible: for example, the FDP includes a simple UI 
that embeds schema.org metadata. Even though there appears to be sufficient infrastructure to support 
“Findability”, the data that are found will not actually be usable if the metadata does not specify the legal 
conditions under which it may be used (R1.1), if the origin, relevance and trustworthiness of the data is not 
clear (R1.2) or if it does not follow standards relevant for a given domain (R1.3). The main challenge 
regarding the reusability of the data is therefore to make sure that any FAIR resource includes such a 
“plurality of accurate and relevant attributes” (R1) to support data reuse. In the findability use case, these 
attributes could furthermore be used to improve search results by automatically prioritizing relevant, 
trustworthy results that the requester is legally able to use for their specific purpose. We note that non-technical 
developments also have an influence: a positive example is the recent adoption of the GDPR [33], which 
is increasingly cited as motivation for works capturing and modeling data usage conditions and constraints 
[20]. Such works are important precursors for convergence toward broadly accepted and generically 
applicable metadata standards for data use and access constraints that have yet to emerge. Finally, 
communities themselves need to identify, develop and promote the required metadata standards, and 
metadata registry services play an important role toward convergence within and across domain boundaries. 


Registries may range from full-featured, generic solutions like FAIRsharing¹⁵ to relatively simple community 
recommendation lists [34, 35, 36]. 


¹³ https://schema.org/. 
¹⁴ https://toolbox.google.com/datasetsearch. 
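

The embedded-metadata approach to findability mentioned above can be illustrated with a short Python sketch that builds a schema.org Dataset description as JSON-LD for inclusion in a Web page. The function name dataset_jsonld and the example values are hypothetical; the properties used (name, description, url, license) are regular schema.org properties, and the license property is the one that speaks to principle R1.1.

```python
import json


def dataset_jsonld(name, description, url, license_url):
    """Build a schema.org Dataset description as an embeddable script tag."""
    doc = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "license": license_url,
    }
    # Escape "</" so that a value can never close the script element early;
    # "<\/" is still valid JSON and decodes back to "</".
    body = json.dumps(doc, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


if __name__ == "__main__":
    print(dataset_jsonld(
        "Example data set",
        "A demonstration record; all values are illustrative.",
        "https://example.org/ds/1",
        "https://creativecommons.org/licenses/by/4.0/",
    ))
```

A crawler that understands schema.org can index the resulting block, while a page without the license value would still be findable but would leave the reuse conditions unclear.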


7. CONCLUSIONS 


In this paper we have shown that there are many ongoing efforts that directly or indirectly contribute to 
the objective of making FAIR a reality. We have shown that these tools contribute to an ecosystem of FAIR 
tooling that covers everything from FAIR data management planning, to production, publication, evaluation, 
finding and (re)using FAIR data. Some of these tools contribute to the design and development of (components 
of) FAIR infrastructures and platforms, while others provide a solution to a very specific FAIR challenge. In 
most cases there are a number of alternative solutions with some overlapping, but also many complementary 
features. Moreover, almost all of these efforts have dependencies on, or reach full potential only in 
combination with other FAIR tools and resources. For example, FAIRifiers are typically more effective with the 
availability of registries of (community adopted) FAIR data models and metadata standards, and FAIR search 
and accessibility services cannot work without descriptions of usage and license conditions. In our 
opinion this signals a creolization phase [5] of FAIR tool development. In the near future we will likely see 
an increase in the number of available FAIR tools, while simultaneously these tools will evolve, converge 
and merge in ways that cannot currently be foreseen. Periodically checking alignment with the original aim 
and intention of the FAIR principles will help to converge such efforts toward the realization of mature FAIR 
tool ecosystems and infrastructures, FAIR-based domain-specific applications like the Personal Health Train¹⁶ 
[37] and the generic Internet of FAIR Data and Services [38]. 